AITopics | loss curve

loss curve

Scaling Law with Learning Rate Annealing

Neural Information Processing Systems, Jun-19-2026, 04:46:34 GMT

We find that the cross-entropy loss curves of neural language models empirically adhere to a scaling law with learning rate (LR) annealing over training steps: $L(s) = L_0 + A \cdot S_1^{-\alpha} - C \cdot S_2$, where $L(s)$ is the validation loss at step $s$, $S_1$ is the area under the LR curve, $S_2$ is the LR annealing area, and $L_0$, $A$, $C$, $\alpha$ are constant parameters.
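As a rough illustration of how a law of the form L(s) = L0 + A * S1^(-alpha) - C * S2 could be evaluated along an LR schedule, here is a minimal Python sketch. The abstract does not say how S2 is computed, so the momentum-weighted accumulation of LR decreases used here (with a hypothetical `decay` factor) is an assumption, as are the cosine schedule, the function names, and any constants passed in.

```python
import math

def cosine_schedule(total_steps, peak_lr, min_lr):
    """Hypothetical cosine LR schedule with one LR value per step (needs total_steps >= 2)."""
    return [
        min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * s / (total_steps - 1)))
        for s in range(total_steps)
    ]

def predicted_loss(lrs, L0, A, C, alpha, decay=0.999):
    """Evaluate L(s) = L0 + A * S1**(-alpha) - C * S2 at every step.

    S1 is the running sum of the LR (area under the LR curve).
    S2 is ASSUMED to be a momentum-weighted sum of LR decreases; the
    abstract only calls it "the LR annealing area".
    Requires strictly positive LR values so that S1 > 0.
    """
    losses = []
    s1 = 0.0        # area under the LR curve so far
    momentum = 0.0  # decayed running sum of per-step LR decreases (assumed form)
    s2 = 0.0        # accumulated annealing area (assumed form)
    prev_lr = lrs[0]
    for lr in lrs:
        s1 += lr
        momentum = decay * momentum + (prev_lr - lr)
        s2 += momentum
        prev_lr = lr
        losses.append(L0 + A * s1 ** (-alpha) - C * s2)
    return losses
```

Under these assumptions a constant schedule gives S2 = 0, so the curve reduces to the power-law term alone, while any schedule that decays the LR picks up an extra loss drop proportional to C.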

large language model, machine learning, natural language, (21 more...)

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Workflow (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

39d4b545fb02556829aab1db805021c3-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 12:24:33 GMT

artificial intelligence, optical flow, variant, (17 more...)

Technology: Information Technology > Artificial Intelligence (0.99)

03600ae6c3392fd65ad7c3a90c6f7ce8-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 06:24:38 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Country: North America > United States (0.67)

Genre: Research Report (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training. Yanlai Yang, Matt Jones

Neural Information Processing Systems, Feb-16-2026, 19:09:11 GMT

The anticipatory recovery behavior emerges and becomes more robust as the architecture scales up its number of parameters.

large language model, machine learning, natural language, (18 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Dominican Republic (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Semantic segmentation of sparse irregular point clouds for leaf/wood discrimination. Yuchen Bai, Jean-Baptiste Durand

Neural Information Processing Systems, Feb-16-2026, 00:22:35 GMT

We also propose a loss function adapted to the severe class imbalance. We show that our model outperforms state-of-the-art alternatives on UAV point clouds.

artificial intelligence, machine learning, point cloud, (18 more...)

Country:

South America > French Guiana (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
(3 more...)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Scaling White-Box Transformers for Vision. Jinrui Yang

Neural Information Processing Systems, Feb-11-2026, 21:16:41 GMT

The largest model described to date is the base model size, which encompasses 77.6M parameters.

crate, large language model, machine learning, (21 more...)

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Supplementary Material: Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation. 1 Proofs

Neural Information Processing Systems, Feb-9-2026, 11:03:05 GMT

The constraint $P_X, P_Y \in \mathrm{Prob}(X)$ states that $P_X$ and $P_Y$ are valid probability distributions. For brevity, we shall not state it explicitly in the rest of the proof. The above equation is similar in spirit to the Kantorovich-Rubinstein duality. An important observation is that the above optimization maximizes over only a single discriminator function (as opposed to two functions in optimization (2)). Hence, it is easier to train in large-scale deep learning problems such as GANs.

artificial intelligence, dataset, machine learning, (19 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

In both cases we observe that the predicted curve is reasonably close to the actual curve, more so at the beginning of the training (which is expected, since the linear approximation is more likely to hold). Point-wise similarity of predicted and observed loss curve. Up to now we focused on prediction error rates (see e.g. We started defining training time as the first time the (smoothed) loss is below a given threshold (which we then normalized w.r.t. In Section 4 we suggest that, in the case of MSE loss, it is possible to predict the training time on a large dataset using a subset of the samples. However, since our training time definition measures the time to reach the asymptotic value (which is what is most useful in practice) rather than the time to reach an absolute threshold, this does not affect the accuracy of the prediction (see Appendix C).
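The threshold-based notion of training time mentioned above (the first time the smoothed loss falls below a given threshold) can be sketched in a few lines of Python. The snippet does not say how the loss is smoothed or how the threshold is normalized, so the exponential moving average, the `beta` parameter, and the function names below are assumptions made for illustration only.

```python
def smooth(losses, beta=0.9):
    """Exponential moving average of a loss curve (the smoothing method is an assumption)."""
    smoothed = []
    avg = losses[0]
    for value in losses:
        avg = beta * avg + (1 - beta) * value
        smoothed.append(avg)
    return smoothed

def training_time(losses, threshold, beta=0.9):
    """Return the first step whose smoothed loss is below `threshold`, or None if never reached."""
    for step, value in enumerate(smooth(losses, beta)):
        if value < threshold:
            return step
    return None
```

With `beta=0.0` the curve is left unsmoothed; larger values of `beta` delay the crossing, which is one reason a measured training time depends on the smoothing choice.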

artificial intelligence, machine learning, training time, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
